Abstract: A model-assisted semiparametric method of estimating finite population totals is investigated to improve the precision of survey estimators by incorporating multivariate auxiliary information. The proposed superpopulation model is a single-index model, which has proven to be a simple and efficient semiparametric tool in multivariate regression. A class of estimators based on polynomial spline regression is proposed. These estimators are robust against deviation from single-index models. Under standard design conditions, the proposed estimators are asymptotically design-unbiased, consistent and asymptotically normal. An iterative optimization routine is provided that is sufficiently fast for users to analyze large and complex survey data within seconds. The proposed method has been applied to simulated datasets and the MU284 dataset, which provide strong evidence corroborating the asymptotic theory.

Key words and phrases: Horvitz-Thompson estimator; model-assisted estimation; semiparametric;

1 Introduction

In this article, the classic finite-population estimation problem is investigated. In what follows, let $U_N = \{1, \ldots, N\}$ denote the $N$ units of the finite population. For each $i \in U_N$, let $y_i$ be a generic characteristic; the objective is to estimate $t_y = \sum_{i \in U_N} y_i$. A probability sample $s$ is drawn from $U_N$ according to a fixed sampling design $p_N(\cdot)$, where $p_N(s)$ is the probability of drawing the sample $s$. Let $\pi_{iN} = \pi_i = \Pr\{i \in s\} = \sum_{s \ni i} p_N(s)$ denote the inclusion probability for element $i \in U_N$, and let $\pi_{ijN} = \pi_{ij} = \Pr\{i, j \in s\} = \sum_{s \ni i,j} p_N(s)$ denote the joint inclusion probability for elements $i, j \in U_N$.

If no information other than the inclusion probabilities is used to estimate t y , a well-known 
design unbiased estimator is the Horvitz-Thompson estimator 

$$\hat t_{y\pi} = \hat t_y = \sum_{i \in s} \frac{y_i}{\pi_i}. \tag{1.1}$$

The variance of the Horvitz-Thompson estimator under the sampling design is 

$$\mathrm{Var}_p(\hat t_y) = \sum_{i,j \in U_N} (\pi_{ij} - \pi_i \pi_j)\, \frac{y_i}{\pi_i}\, \frac{y_j}{\pi_j}.$$
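As a quick numerical illustration of the Horvitz-Thompson estimator and its design variance, the following Python sketch enumerates every sample of a toy population. The population values and the design (simple random sampling without replacement, so that $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/\{N(N-1)\}$) are assumptions made only for this example; the function names are hypothetical.

```python
from itertools import combinations

def ht_total(y_sample, pi_sample):
    """Horvitz-Thompson estimate: sum of y_i / pi_i over the sampled units."""
    return sum(y / p for y, p in zip(y_sample, pi_sample))

def ht_design_variance(y, pi, pi_joint):
    """Sum over all i, j in U_N of (pi_ij - pi_i pi_j)(y_i/pi_i)(y_j/pi_j),
    using the convention pi_ii = pi_i."""
    N = len(y)
    return sum(
        (pi_joint(i, j) - pi[i] * pi[j]) * (y[i] / pi[i]) * (y[j] / pi[j])
        for i in range(N) for j in range(N)
    )

# Toy population (assumed values) under simple random sampling of size n.
y = [3.0, 1.0, 4.0, 1.0, 5.0]
N, n = len(y), 2
pi = [n / N] * N

def pi_joint(i, j):
    """Joint inclusion probability under simple random sampling."""
    return n / N if i == j else n * (n - 1) / (N * (N - 1))

# All samples are equally likely, so averaging over them gives E_p and Var_p.
samples = list(combinations(range(N), n))
estimates = [ht_total([y[i] for i in s], [pi[i] for i in s]) for s in samples]
mean_est = sum(estimates) / len(samples)
var_est = sum((e - mean_est) ** 2 for e in estimates) / len(samples)
```

Averaging over all equally likely samples reproduces the population total, and the enumerated variance agrees with the closed-form expression, which is what design-unbiasedness and the variance formula assert.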
The efficiency of the Horvitz-Thompson estimator can be significantly improved by incorporating some "cheap" auxiliary information at the population level in addition to sample data. Such auxiliary information is often available for all elements of the population of interest in many surveys. For instance, in many countries, administrative registers provide extensive sources of auxiliary information. Complete registers can give access to variables such as sex, age, income and country of birth. Studies of labor force characteristics or household expenditure patterns, for example, might benefit from these auxiliary data. Another example is the satellite images or GPS data used in spatial sampling. These data are often collected at the population level and are available at little or no extra cost, especially compared to the cost of collecting the survey data. For more examples of auxiliary information, see [?]. The use of auxiliary information to improve the accuracy of survey estimators dates back to post-stratification, calibration, ratio and regression estimation; see, among others, [28, 32] for a general review of these methods. Auxiliary information can also be used to increase the accuracy of estimators of the finite population distribution function; see, for example, [?].

In this article, let $\mathbf{x}_i = \{x_{i1}, \ldots, x_{id}\}$ be a $d$-dimensional auxiliary variable vector, $i \in U_N$, and assume that $\{(\mathbf{x}_i, y_i)\}_{i \in U_N}$ is a realization of $(\mathbf{X}, Y)$ from an infinite superpopulation, $\xi$, satisfying

$$Y = m(\mathbf{X}) + \sigma(\mathbf{X})\varepsilon, \tag{1.2}$$

in which the $d$-variate function $m$ is the unknown mean function of $Y$ conditional on the auxiliary information vector $\mathbf{X}$, often assumed to be smooth, and $\sigma$ is the unknown standard deviation function. The error $\varepsilon$ satisfies $E_\xi(\varepsilon \,|\, \mathbf{X}) = 0$ and $E_\xi(\varepsilon^2 \,|\, \mathbf{X}) = 1$, where $E_\xi$ is the expectation with respect to the superpopulation $\xi$. The interesting problem is how to take advantage of the regression relationship (1.2) to better estimate $t_y$.

The traditional parametric approach to analyzing a regression relationship assumes that the superpopulation model is fully described by a finite set of parameters, for example, the linear regression estimator discussed in [28]. However, it sometimes requires prohibitively complex models with a very large number of parameters to address various hypotheses. It is very difficult to obtain any prior model information about the regression function $m$ in (1.2), and substantial estimation bias can result if a preselected parametric model is too restricted to fit unexpected features. As an alternative, one can try to estimate the unknown regression relationship nonparametrically without reference to a specific form. The flexibility of nonparametric smoothing/regression is extremely helpful in exploratory data analysis as well as in obtaining robust predictions; see, among others, [3] for details.

Nonparametric methods for survey data are still rather sparse but have begun to emerge as important and practical tools; see, among others, [1], [27] and [38]. Reference [1] first proposed a nonparametric model-assisted estimator based on local polynomial regression, which generalized the parametric framework in survey sampling and improved the precision of the survey estimators. Their investigation is restricted to the scalar case, i.e., $d = 1$. Nowadays most surveys involve more than one auxiliary variable (reference [?]). For example, the auxiliary information obtained from remote sensing data, satellite images and GPS data provides a wide and growing range of variables to be employed in spatial sampling. The Northeastern lakes survey discussed in [14], among others, is a good example. In that study, a lot of information, such as longitude, latitude and elevation, is known for every lake in the population for the Environmental Monitoring and Assessment Program (EMAP) of the U.S. Environmental Protection Agency (EPA). In addition, the growing possibilities of information and communication technology have made it possible to develop very large and complex surveys. In this article, a $d$-dimensional auxiliary vector is considered to improve the efficiency of estimating $t_y$ for both small and large surveys.

Research in nonparametric survey theory and methodology when the dimension of the auxiliary information vector is high, however, is quite challenging. A key difficulty is the "curse of dimensionality": the optimal rate of convergence decreases with dimensionality ([31]). One solution is regression in the form of the additive model popularized by [20]; see, among others, [2] and [27] for possible applications of the additive model to survey sampling. A weakness of the purely additive model is that interactions between the explanatory variables are completely ignored ([30]). An attractive alternative to the additive model is the single-index model given in (2.1). Similar to the first step of projection pursuit regression, the single-index model reduces dimensionality but does not incorporate interactions; see [19, 21, 36] for instance. The basic appeal of the single-index model is that it is by nature a hybrid of parametric and nonparametric regression. It preserves the simplicity of parametric regression where simplicity is sufficient: the $d$-variate function $m(\mathbf{x}) = m(x_1, \ldots, x_d)$ is expressed as a univariate function of $\mathbf{x}^T\theta_0 = \sum_{q=1}^{d} x_q \theta_{0,q}$; it also employs the flexibility of nonparametric regression where flexibility is necessary.

In this article, I investigate the single-index model-assisted estimator for the finite population total, that is, the superpopulation model in (1.2) is assumed to be a SIM. Under standard design conditions, a design-consistent estimator of $\theta_0$ is obtained using polynomial splines, and the proposed estimator of $t_y$ is asymptotically design-unbiased, consistent and asymptotically normal. By taking advantage of spline smoothing and iterative optimization routines, the proposed method is computationally efficient compared with the kernel additive model approaches in the literature of nonparametric survey estimation, in which iterative approaches such as a backfitting algorithm ([3, 20]) or marginal integration ([25]) are necessary. The rest of the article is organized as follows. Section 2 gives details of the model specification and the proposed method of estimation. Section 3 describes some properties of the estimator. Section 4 provides the actual procedure to implement the method. Section 5 reports the empirical results. All technical proofs are contained in Appendices A and B.



2 Superpopulation Model and Proposed Estimator 
2.1 Single-index superpopulation model 

In this article, the proposed superpopulation model $\xi$ in (1.2) is a single-index model (SIM),

$$Y = m(\mathbf{X}^T\theta_0) + \sigma(\mathbf{X})\varepsilon, \tag{2.1}$$

where the unknown parameter $\theta_0$ is called the single-index coefficient, used for simple interpretation once estimated, and the function $m$ is an unknown smooth function used for further data summary.

If the SIM is misspecified, however, a goodness-of-fit test is necessary and the estimation of $\theta_0$ must be rethought; see [35]. So in this article, instead of presuming that the underlying true function $m$ is a single-index function like the one defined in (2.1), the single index is identified by the best approximation to the multivariate function $m$. Specifically, a univariate function $g$ is estimated that optimally approximates the multivariate function $m$ in the sense of

$$g(\nu) = E_\xi\left[m(\mathbf{X}) \,|\, \mathbf{X}^T\theta_0 = \nu\right]. \tag{2.2}$$

The advantage of this method is that it works well even under model misspecification, so that it is more useful in applications than the traditional SIM given in (2.1).
For the superpopulation model defined by (1.2) and (2.2), let

$$m_\theta(\mathbf{X}^T\theta) = E_\xi\left[Y \,|\, \mathbf{X}^T\theta\right] = E_\xi\left[m(\mathbf{X}) \,|\, \mathbf{X}^T\theta\right]$$

for any fixed $\theta$, where, as noted in the introduction, $E_\xi$ denotes the expected value with respect to the superpopulation $\xi$ in (1.2) and (2.2). Define the risk function of $\theta$ as

$$R(\theta) = E_\xi\left[\{Y - m_\theta(\mathbf{X}^T\theta)\}^2\right] = E_\xi\{m(\mathbf{X}) - m_\theta(\mathbf{X}^T\theta)\}^2 + E_\xi\sigma^2(\mathbf{X}), \tag{2.3}$$

which is uniquely minimized at $\theta_0 \in S_+^{d-1} = \left\{(\theta_1, \ldots, \theta_d) \,\middle|\, \sum_{q=1}^{d} \theta_q^2 = 1,\ \theta_d > 0\right\}$.

Remark 2.1: It is obvious that without constraints, the coefficient vector $\theta_0$ is identified only up to a constant factor. Typically, one requires that $\|\theta_0\| = 1$, which entails that at least one of the coordinates $\theta_{0,1}, \ldots, \theta_{0,d}$ is nonzero. One could assume without loss of generality that $\theta_{0,d} > 0$, and the candidate $\theta_0$ would then belong to the upper unit hemisphere $S_+^{d-1}$.

2.2 Spline smoothing

Estimation of both $\theta_0$ and $g(\cdot)$ in model (2.2) requires a degree of statistical smoothing. In this article, all estimation is carried out via polynomial splines. The use of polynomial spline smoothing in generalized nonparametric models can be traced back to [31]. As pointed out in [34], among others, one of the important advantages of spline smoothing is the relative ease with which spline estimators can be computed, even for large datasets or datasets with regions of sparse data. In addition, spline smoothing is a global smoothing method. After the spline basis is chosen, the coefficients can be estimated by an efficient optimization procedure. In contrast, kernel-based methods such as kernel-based backfitting ([3, 20]) and marginal integration approaches ([25]), in which the optimization has to be conducted repeatedly at every local data point, are very time-consuming.

To introduce the function space of splines of order $p$, one pre-selects an integer $J = J_N$ with $N^{1/6} \ll J_N \ll N^{1/5}(\log N)^{-2/5}$, see Assumption (A4) below, and divides $[0, 1]$ into $(J + 1)$ subintervals $[k_j, k_{j+1})$, $j = 0, \ldots, J - 1$, and $[k_J, 1]$, where $\{k_j\}_{j=1}^{J}$ is a sequence of equally-spaced points, called interior knots, given as

$$k_{1-p} = \cdots = k_{-1} = k_0 = 0 < k_1 < \cdots < k_J < 1 = k_{J+1} = \cdots = k_{J+p},$$

in which $k_j = j/(J + 1)$, $j = 0, 1, \ldots, J + 1$. The $j$-th B-spline of order $p$, denoted by $B_{j,p}$, is recursively defined as in [10]. In the following, let $\Gamma^{(2)} = \Gamma^{(2)}[0, 1]$ be the space of all the second order smoothness functions that are polynomials of degree 3 on each subinterval.
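The recursive definition of the B-spline basis can be sketched in a few lines of Python. The sketch below uses the standard Cox-de Boor recursion on the equally-spaced knot sequence described above; it is an illustration under that assumption, not the implementation used for the results in this article, and the function name is hypothetical.

```python
def bspline_basis(z, J, p=4):
    """Evaluate the J + p B-splines of order p (degree p - 1) at z in [0, 1],
    with J equally-spaced interior knots k_j = j / (J + 1)."""
    # Extended knot sequence k_{1-p}, ..., k_{J+p} with repeated boundary knots.
    t = [0.0] * (p - 1) + [j / (J + 1) for j in range(J + 2)] + [1.0] * (p - 1)
    # Order 1: indicator functions of the knot intervals [t_j, t_{j+1}).
    b = [1.0 if t[j] <= z < t[j + 1] else 0.0 for j in range(len(t) - 1)]
    if z == 1.0:
        b[p - 1 + J] = 1.0  # the last subinterval [k_J, 1] is closed on the right
    # Cox-de Boor recursion from order k - 1 to order k (0/0 treated as 0).
    for k in range(2, p + 1):
        nb = []
        for j in range(len(t) - k):
            left_den = t[j + k - 1] - t[j]
            right_den = t[j + k] - t[j + 1]
            left = (z - t[j]) / left_den * b[j] if left_den > 0 else 0.0
            right = (t[j + k] - z) / right_den * b[j + 1] if right_den > 0 else 0.0
            nb.append(left + right)
        b = nb
    return b

values = bspline_basis(0.37, J=3)  # cubic B-splines (order 4), indexed j = -3, ..., J
```

For order $p = 4$ the function returns $J + 4$ values, matching the index range $j = -3, \ldots, J$ used later for the B-spline matrix, and the values are nonnegative and sum to one at every point of $[0, 1]$.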

Direct calculation shows that under Assumption (A1) in Section 3.2, for any $\theta \in S_+^{d-1}$, the variable $\mathbf{X}^T\theta$ has a Lebesgue probability density function (pdf) that is uniformly bounded below and above by the pdf of a rescaled centered Beta$\{(d + 1)/2, (d + 1)/2\}$,

$$f_d(\nu) = \frac{\Gamma(d + 1)}{a\,\Gamma\{(d + 1)/2\}^2\, 2^d}\left(1 - \frac{\nu^2}{a^2}\right)^{(d-1)/2} I(|\nu| \le a),$$

which vanishes at the boundary points $-a$ and $a$. This makes nonparametric smoothing of $Y$ on $\mathbf{X}^T\theta$ difficult. I therefore first transform the variable $\mathbf{X}^T\theta$ by using the cumulative distribution function $F_d$ of $f_d$,

$$F_d(\nu) = \int_{-1}^{\nu/a} \frac{\Gamma(d + 1)}{\Gamma\{(d + 1)/2\}^2\, 2^d}\,(1 - t^2)^{(d-1)/2}\, dt, \quad \nu \in [-a, a]. \tag{2.4}$$

For the rest of the article, denote the transformed variable of the single-index variable $\mathbf{X}^T\theta$ by $Z_\theta = F_d(\mathbf{X}^T\theta)$ and let $\varphi_\theta$ be the conditional expectation of $m$ given the transformed variable $Z_\theta$, i.e.

$$\varphi_\theta(Z_\theta) = E_\xi\{m(\mathbf{X}) \,|\, Z_\theta\} = E_\xi\{m(\mathbf{X}) \,|\, \mathbf{X}^T\theta\} = m_\theta(\mathbf{X}^T\theta). \tag{2.5}$$

Remark 2.2: The transformed variable $Z_\theta$ has a quasi-uniform $[0, 1]$ distribution, i.e., the pdf of the transformed variable is supported on $[0, 1]$ with a positive lower bound. In practice, the radius $a$ can take the value of the $100(1 - \alpha)$ percentile of $\{\|\mathbf{x}_i\|\}_{i \in U_N}$, for example, $\alpha = 0.05$.
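The transformation $F_d$ in (2.4) is the integral of a centered Beta$\{(d+1)/2, (d+1)/2\}$ density up to $\nu/a$, so it can be evaluated by one-dimensional numerical integration. The following Python sketch uses composite Simpson's rule with only the standard library; the function names and the number of integration steps are assumptions made for illustration, and a library routine for the regularized incomplete beta function would serve equally well.

```python
from math import gamma

def centered_beta_density(t, d):
    """Density of the centered Beta{(d+1)/2, (d+1)/2} distribution on [-1, 1]."""
    c = gamma(d + 1) / (gamma((d + 1) / 2) ** 2 * 2 ** d)
    return c * (1.0 - t * t) ** ((d - 1) / 2)

def F_d(v, d, a, n_steps=2000):
    """CDF transformation (2.4): integral of the density from -1 to v / a,
    approximated by composite Simpson's rule (n_steps must be even)."""
    u = max(-1.0, min(1.0, v / a))  # values outside [-a, a] are clamped
    h = (u + 1.0) / n_steps
    total = centered_beta_density(-1.0, d) + centered_beta_density(u, d)
    for k in range(1, n_steps):
        weight = 4 if k % 2 else 2
        total += weight * centered_beta_density(-1.0 + k * h, d)
    return total * h / 3.0

# Transform a single-index value x^T theta = 0.5 with d = 3 and radius a = 2.
z = F_d(0.5, d=3, a=2.0)
```

The transform maps $[-a, a]$ monotonically onto $[0, 1]$, with $F_d(0) = 1/2$ by symmetry. For $d = 1$ the density is constant, so the transform reduces to the linear map $(\nu/a + 1)/2$.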

2.3 "Oracle" population-based estimator 

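If the entire realization were known by "oracle", one could create an "oracle" estimator of $\theta_0$ and $g$ in (2.2) through a profile least-squares method. One first estimates the single-index coefficient $\theta_0$ by a consistent estimator obtained by minimizing the empirical version of the risk function $R(\theta)$ defined in (2.3), i.e.

$$\tilde\theta = \arg\min_{\theta \in S_+^{d-1}} \tilde R(\theta), \tag{2.6}$$

where

$$\tilde R(\theta) = N^{-1} \sum_{i \in U_N} \{y_i - \tilde\varphi_\theta(z_{\theta i})\}^2, \tag{2.7}$$

and

$$\tilde\varphi_\theta(\cdot) = \arg\min_{\varphi(\cdot) \in \Gamma^{(2)}[0,1]} \sum_{i \in U_N} \{y_i - \varphi(z_{\theta i})\}^2. \tag{2.8}$$

Then the link function $g$ can be estimated by $\tilde g$, a cubic spline smoother of $\{y_i\}$ on $\{z_{\tilde\theta i}\}$, i.e., $\tilde g(\nu) = \tilde\varphi_{\tilde\theta}(F_d(\nu))$, where $F_d(\cdot)$ is defined in (2.4). Thus the best single-index approximation to $m(\mathbf{x})$ is $\tilde m(\mathbf{x}) = \tilde g(\mathbf{x}^T\tilde\theta) = \tilde\varphi_{\tilde\theta}(z_{\tilde\theta})$.

Let $\mathbf{y} = (y_1, y_2, \ldots, y_N)^T$, let $\mathbf{B}_\theta = \{B_{j,4}(z_{\theta i})\}_{i \in U_N,\, j = -3, \ldots, J}$ be the B-spline matrix for any fixed $\theta$, and let $\mathbf{e}_i$ be an $N$-vector with a 1 in the $i$-th position and 0 elsewhere. Write

$$\tilde m_i = \tilde g(\mathbf{x}_i^T\tilde\theta) = \tilde\varphi_{\tilde\theta}(z_{\tilde\theta i}) = \mathbf{e}_i^T \mathbf{B}_{\tilde\theta}\left(\mathbf{B}_{\tilde\theta}^T \mathbf{B}_{\tilde\theta}\right)^{-1} \mathbf{B}_{\tilde\theta}^T \mathbf{y}. \tag{2.9}$$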

Clearly, $\tilde m_i$ is the spline single-index prediction at $\mathbf{x}_i$ based on the entire finite population. If these pseudo predictions $\tilde m_i$ were known, then a design-unbiased estimator of $t_y$ would be the generalized difference estimator

$$\tilde t_{y,\mathrm{diff}} = \sum_{i \in s} \frac{y_i - \tilde m_i}{\pi_i} + \sum_{i \in U_N} \tilde m_i, \tag{2.10}$$

as given on page 221 of reference [?]. The design variance of $\tilde t_{y,\mathrm{diff}}$ in (2.10) is

$$\mathrm{Var}_p(\tilde t_{y,\mathrm{diff}}) = \sum_{i,j \in U_N} (\pi_{ij} - \pi_i\pi_j)\, \frac{y_i - \tilde m_i}{\pi_i}\, \frac{y_j - \tilde m_j}{\pi_j}.$$

2.4 Sample-based estimator

However, the predictions $\tilde m_i$ for $m(\mathbf{x}_i)$ cannot be computed directly from data, because the only $y_i$'s observed are those with $i \in s$. Therefore, each $\tilde m_i$ needs to be replaced by a sample-based consistent estimator. For any fixed $\theta$, the sample-based cubic spline estimator $\hat\varphi_\theta$ of $\varphi_\theta$ in (2.5) is defined as

$$\hat\varphi_\theta(\cdot) = \arg\min_{\varphi(\cdot) \in \Gamma^{(2)}[0,1]} \sum_{i \in s} \pi_i^{-1} \{y_i - \varphi(z_{\theta i})\}^2. \tag{2.11}$$

Define the sample-based empirical risk function of $\theta$

$$\hat R(\theta) = N^{-1} \sum_{i \in s} \pi_i^{-1} \{y_i - \hat\varphi_\theta(z_{\theta i})\}^2, \tag{2.12}$$

then the sample design-based spline estimator of $\theta_0$ is defined as

$$\hat\theta = \arg\min_{\theta \in S_+^{d-1}} \hat R(\theta), \tag{2.13}$$

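and the spline estimator of $g$ is $\hat g$, i.e., $\hat g(\nu) = \hat\varphi_{\hat\theta}(F_d(\nu))$. For any $i \in s$, let

$$\hat m_i = \hat g(\mathbf{x}_i^T\hat\theta) = \hat\varphi_{\hat\theta}(z_{\hat\theta i}) = \mathbf{e}_i^T \mathbf{B}_{\hat\theta,s}\left(\mathbf{B}_{\hat\theta,s}^T \mathbf{W}_s \mathbf{B}_{\hat\theta,s}\right)^{-1} \mathbf{B}_{\hat\theta,s}^T \mathbf{W}_s \mathbf{y}_s, \tag{2.14}$$

where $\mathbf{y}_s = \{y_i\}_{i \in s}$ is the $n_N$-vector of $y_i$ obtained in the sample and

$$\mathbf{B}_{\theta,s} = \{B_{j,4}(z_{\theta i})\}_{i \in s,\, j = -3, \ldots, J}, \qquad \mathbf{W}_s = \mathrm{diag}\left\{\pi_i^{-1}\right\}_{i \in s}.$$

Then the sample design-based B-spline estimator of $t_y$ is

$$\hat t_{y,\mathrm{diff}} = \sum_{i \in s} \frac{y_i - \hat m_i}{\pi_i} + \sum_{i \in U_N} \hat m_i. \tag{2.15}$$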
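Once predictions are available for every population unit, the estimator in (2.15) is a one-line computation. The Python sketch below takes the predictions as given and does not reproduce the spline fit; the function name and the toy numbers are assumptions for illustration only.

```python
def difference_estimator(y_sample, m_hat_sample, pi_sample, m_hat_population):
    """Model-assisted difference estimator: the design-weighted sum of sample
    residuals plus the sum of the predictions over the whole population."""
    correction = sum(
        (y - m) / p for y, m, p in zip(y_sample, m_hat_sample, pi_sample)
    )
    return correction + sum(m_hat_population)

# Toy numbers (assumed): two sampled units out of four, pi_i = 0.5 for both.
t_hat = difference_estimator(
    y_sample=[2.0, 4.0],
    m_hat_sample=[1.5, 3.0],
    pi_sample=[0.5, 0.5],
    m_hat_population=[1.5, 3.0, 3.0, 2.0],
)
```

Two limiting cases help in reading the formula: with all predictions equal to zero the estimator reduces to the Horvitz-Thompson estimator, and with predictions that fit the sampled values exactly it reduces to the sum of the predictions over the population.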

3 Properties of the estimator 

3.1 A simple alternative expression for the estimator 

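Like the ratio and linear regression estimators ([28]) and the penalized spline estimators ([2]), the B-spline estimator defined in (2.15) can also be represented in a simple form. Let $\mathbf{t}_z$ and $\hat{\mathbf{t}}_{z\pi}$ be two vectors:

$$\mathbf{t}_z = \left\{\sum_{i \in U_N} B_{j,4}(z_{\hat\theta i})\right\}_{j = -3, \ldots, J}, \qquad \hat{\mathbf{t}}_{z\pi} = \left\{\sum_{i \in s} \pi_i^{-1} B_{j,4}(z_{\hat\theta i})\right\}_{j = -3, \ldots, J}.$$

Then the estimator in (2.15) can be written as $\hat t_{y,\mathrm{diff}} = \hat t_y + (\mathbf{t}_z - \hat{\mathbf{t}}_{z\pi})^T \hat\gamma$, where

$$\hat\gamma = \left(\mathbf{B}_{\hat\theta,s}^T \mathbf{W}_s \mathbf{B}_{\hat\theta,s}\right)^{-1} \mathbf{B}_{\hat\theta,s}^T \mathbf{W}_s \mathbf{y}_s.$$

Noting that $(1, \ldots, 1)_{J+4}\, \mathbf{B}_{\hat\theta,s}^T = (1, \ldots, 1)_{n_N}$ and

$$\hat{\mathbf{t}}_{z\pi}^T = (1, \ldots, 1)_{J+4}\, \mathbf{B}_{\hat\theta,s}^T \mathbf{W}_s \mathbf{B}_{\hat\theta,s},$$

one has

$$\hat{\mathbf{t}}_{z\pi}^T \hat\gamma = (1, \ldots, 1)_{J+4}\, \mathbf{B}_{\hat\theta,s}^T \mathbf{W}_s \mathbf{B}_{\hat\theta,s} \left(\mathbf{B}_{\hat\theta,s}^T \mathbf{W}_s \mathbf{B}_{\hat\theta,s}\right)^{-1} \mathbf{B}_{\hat\theta,s}^T \mathbf{W}_s \mathbf{y}_s = (1, \ldots, 1)_{J+4}\, \mathbf{B}_{\hat\theta,s}^T \mathbf{W}_s \mathbf{y}_s = (1, \ldots, 1)_{n_N} \mathbf{W}_s \mathbf{y}_s = \hat t_y.$$

So the proposed estimator takes the simple and attractive form $\hat t_{y,\mathrm{diff}} = \mathbf{t}_z^T \hat\gamma = \sum_{i \in U_N} \hat m_i$.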



3.2 Assumptions 

I will use the traditional asymptotic framework given in reference [?], in which both the population and sample sizes increase as $N \to \infty$. There are two sources of "variation" to be considered here. The first is introduced by the random sample design and the corresponding measure is denoted by $p$. The "with $p$-probability 1", "$O_p$", "$o_p$" and "$E_p(\cdot)$" notation below is with respect to this measure. The second is associated with the superpopulation from which the finite population is viewed as a sample. The corresponding measure and notation are $\xi$, "with $\xi$-probability 1", "$O_\xi$", "$o_\xi$" and "$E_\xi(\cdot)$".

Let $B_a^d = \{\mathbf{x} \in R^d \,|\, \|\mathbf{x}\| \le a\}$ be the $d$-dimensional ball with radius $a$, center $\mathbf{0}$ and volume $\mathrm{Vol}_d(B_a^d)$. Let

$$C^{(k)}(B_a^d) = \left\{m \,\middle|\, \text{the } k\text{-th order partial derivatives of } m \text{ are continuous on } B_a^d\right\}$$

be the space of $k$-th order smooth functions. Before stating the asymptotic results, I formulate some assumptions:

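(A1) The density function of $\mathbf{X}$ satisfies $f(\mathbf{x}) \in C^{(4)}(B_a^d)$ for some $a > 0$, and there are positive constants $c_f \le C_f$ such that $c_f/\mathrm{Vol}_d(B_a^d) \le f(\mathbf{x}) \le C_f/\mathrm{Vol}_d(B_a^d)$ if $\mathbf{x} \in B_a^d$ and $f(\mathbf{x}) = 0$ if $\mathbf{x} \notin B_a^d$.

(A2) The regression function $m \in C^{(4)}(B_a^d)$.

(A3) The error $\varepsilon$ satisfies $E_\xi(\varepsilon \,|\, \mathbf{X}) = 0$, $E_\xi(\varepsilon^2 \,|\, \mathbf{X}) = 1$ and there exists a positive constant $M$ such that $\sup_{\mathbf{x} \in B_a^d} E_\xi\left(|\varepsilon|^3 \,\middle|\, \mathbf{X} = \mathbf{x}\right) < M$. The standard deviation function $\sigma(\mathbf{x})$ is continuous on $B_a^d$, with $0 < c_\sigma \le \inf_{\mathbf{x} \in B_a^d} \sigma(\mathbf{x}) \le \sup_{\mathbf{x} \in B_a^d} \sigma(\mathbf{x}) \le C_\sigma < \infty$.

(A4) As $N \to \infty$, $n_N N^{-1} \to \pi \in (0, 1)$ and the number of interior knots $J_N$ satisfies $n_N^{1/6} \ll J_N \ll n_N^{1/5}\{\log(n_N)\}^{-2/5}$.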

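(A5) For all $N$, $\min_{i \in U_N} \pi_i \ge \lambda > 0$, $\min_{i,j \in U_N} \pi_{ij} \ge \lambda^* > 0$ and

$$\limsup_{N \to \infty}\ n_N \max_{i,j \in U_N,\, i \ne j} |\pi_{ij} - \pi_i\pi_j| < \infty.$$

(A6) Let $D_{k,N}$ be the set of all distinct $k$-tuples $(i_1, i_2, \ldots, i_k)$ from $U_N$,

$$\limsup_{N \to \infty}\ n_N^2 \max_{(i_1,i_2,i_3,i_4) \in D_{4,N}} \left|E_p\left[(I_{i_1} - \pi_{i_1})(I_{i_2} - \pi_{i_2})(I_{i_3} - \pi_{i_3})(I_{i_4} - \pi_{i_4})\right]\right| < \infty,$$

$$\limsup_{N \to \infty}\ n_N \max_{(i_1,i_2,i_3,i_4) \in D_{4,N}} \left|E_p\left[(I_{i_1}I_{i_2} - \pi_{i_1 i_2})(I_{i_3}I_{i_4} - \pi_{i_3 i_4})\right]\right| < \infty,$$

and

$$\limsup_{N \to \infty}\ n_N \max_{(i_1,i_2,i_3) \in D_{3,N}} \left|E_p\left[(I_{i_1} - \pi_{i_1})^2 (I_{i_2} - \pi_{i_2})(I_{i_3} - \pi_{i_3})\right]\right| < \infty,$$

where $I_i = 1$ if $i \in s$ and $I_i = 0$ otherwise.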



(A7) The risk function $\tilde R$ in (2.7) is locally convex at $\tilde\theta$: $\forall \varepsilon > 0$, $\exists \delta > 0$ such that $\|\theta - \tilde\theta\| < \varepsilon$ if $\tilde R(\theta) - \tilde R(\tilde\theta) < \delta$.

(A8) The second order partial derivative of the risk function, $\tilde R(\theta)$, is bounded at $\theta = \tilde\theta$.

Remark 3.1: Assumptions (A1)-(A3) are typical in the nonparametric smoothing literature; see, for instance, [36]. Assumption (A4) concerns how to choose the number of knots in order to achieve the optimal nonparametric rate of convergence. In practice the number of interior knots $J_N$ is chosen according to (4.2). Assumptions (A5) and (A6) involve the inclusion probabilities of the design, which are also assumed in reference [1]. Assumption (A7) is used to derive the design consistency of $\hat\theta$ to $\tilde\theta$, and Assumption (A8) is used to obtain the rate of the consistency.

3.3 Asymptotic properties of the estimator 



The estimator $\hat\theta$ in (2.13) of the single-index coefficient $\theta_0$ is asymptotically design consistent, as the following theorem demonstrates.

Theorem 1. Under Assumptions (A1)-(A5) and (A7), $\hat\theta$ is asymptotically design consistent in the sense that with $p$-probability 1

$$\lim_{N \to \infty} \|\hat\theta - \tilde\theta\| = 0,$$

and further if (A8) holds, then

$$\|\hat\theta - \tilde\theta\| = O_p\left\{(J_N/n_N)^{1/2}\right\},$$

where $\tilde\theta$ and $\hat\theta$ are the population and sample based estimators of $\theta_0$ in (2.6) and (2.13).

Like the local polynomial estimators in [1], the following theorem shows that the estimator $\hat t_{y,\mathrm{diff}}$ in (2.15) is asymptotically design unbiased and design consistent.

Theorem 2. Under Assumptions (A1)-(A5) and (A7)-(A8), the model-assisted spline estimator $\hat t_{y,\mathrm{diff}}$ in (2.15) is asymptotically design unbiased (ADU) in the sense that

$$\lim_{N \to \infty} E_p\left[\frac{\hat t_{y,\mathrm{diff}} - t_y}{N}\right] = 0 \quad \text{with } \xi\text{-probability 1},$$

and is design consistent in the sense that for all $\eta > 0$

$$\lim_{N \to \infty} E_p\left[I_{\{|\hat t_{y,\mathrm{diff}} - t_y| > N\eta\}}\right] = 0 \quad \text{with } \xi\text{-probability 1}.$$



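Like the local polynomial estimators in [1], the following theorem shows that the estimator in (2.15) also inherits the limiting distribution of the generalized difference estimator.

Theorem 3. Under Assumptions (A1)-(A8), for $\tilde t_{y,\mathrm{diff}}$ and $\hat t_{y,\mathrm{diff}}$ in (2.10) and (2.15),

$$\frac{N^{-1}(\tilde t_{y,\mathrm{diff}} - t_y)}{\mathrm{Var}_p^{1/2}(N^{-1}\tilde t_{y,\mathrm{diff}})} \to N(0, 1)$$

as $N \to \infty$ implies

$$\frac{N^{-1}(\hat t_{y,\mathrm{diff}} - t_y)}{\hat V^{1/2}(N^{-1}\hat t_{y,\mathrm{diff}})} \to N(0, 1),$$

where

$$\hat V(N^{-1}\hat t_{y,\mathrm{diff}}) = \frac{1}{N^2} \sum_{i,j \in U_N} \frac{\pi_{ij} - \pi_i\pi_j}{\pi_{ij}}\, \frac{y_i - \hat m_i}{\pi_i}\, \frac{y_j - \hat m_j}{\pi_j}\, I_i I_j. \tag{3.1}$$

Details of the proofs of Theorems 1-3 are given in Appendix B.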



Remark 3.2: In reference [2], the number of knots is fixed, thus the bias caused by spline approximation is ignored in developing the asymptotic theory. It has been shown in many contexts of function estimation that, by letting the number of knots increase with the sample size at an appropriate rate, the spline estimate of an unknown function can achieve the optimal nonparametric rate of convergence; see [22, 34]. For this purpose, in this article, $n_N^{1/6} \ll J_N \ll n_N^{1/5}\{\log(n_N)\}^{-2/5}$, as shown in Assumption (A4).

Remark 3.3: As one referee pointed out, the asymptotics with the number of knots allowed to grow are much more challenging, and only very recent work tackles this problem, e.g. [9, 24]. However, the results obtained in this article are not directly comparable to those obtained there, due to different settings of the model. The problem in [9, 24] is a purely nonparametric curve estimation problem and the objective is to study the asymptotics of the curve estimators fitted with penalized splines. The problem here, in contrast, is a semiparametric one and the main interest is in estimating the parametric component $\theta$. At the population level, it has been shown that $\theta_0$ should be estimable at the usual root-$n$ rate of convergence using techniques similar to those used in deriving the asymptotics of maximum likelihood estimators. In this article, examination of the approximation results of the derivatives (up to the 2nd order) of the risk function in (2.3) by their empirical versions implies that a range of smoothing parameters is allowed for the desired asymptotics; see Appendix A. This differs from nonparametric curve estimation, in which the optimal choice of the smoothing parameter is required to achieve the optimal rate of convergence.

4 Algorithm 

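In this section, the actual procedure to implement the estimation of $\theta_0$ and $t_y$ is described. I first introduce some new notation. For any fixed $\theta$, write $\mathbf{P}_{\theta,s} = \mathbf{B}_{\theta,s}\left(\mathbf{B}_{\theta,s}^T \mathbf{W}_s \mathbf{B}_{\theta,s}\right)^{-1} \mathbf{B}_{\theta,s}^T \mathbf{W}_s$ as the sample projection matrix onto the cubic spline space. For any $q = 1, \ldots, d$, write $\dot{\mathbf{B}}_q = \frac{\partial}{\partial\theta_q}\mathbf{B}_{\theta,s}$ and $\dot{\mathbf{P}}_q = \frac{\partial}{\partial\theta_q}\mathbf{P}_{\theta,s}$ as the first order derivatives of $\mathbf{B}_{\theta,s}$ and $\mathbf{P}_{\theta,s}$ with respect to $\theta$. Write $\theta_{-d} = (\theta_1, \ldots, \theta_{d-1})^T$. Let $\hat S^*(\theta_{-d})$ be the score vector of the risk function $\hat R^*(\theta_{-d}) = \hat R\left(\theta_1, \theta_2, \ldots, \theta_{d-1}, \sqrt{1 - \|\theta_{-d}\|_2^2}\right)$, that is, $\hat S^*(\theta_{-d}) = \frac{\partial}{\partial\theta_{-d}}\hat R^*(\theta_{-d})$. The next lemma provides the exact form of $\hat S^*(\theta_{-d})$.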

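Lemma 4.1. For $\hat S^*(\theta_{-d})$, the score vector of $\hat R^*(\theta_{-d})$, one has

$$\hat S^*(\theta_{-d}) = -n_N^{-1}\left\{\mathbf{y}_s^T \dot{\mathbf{P}}_q \mathbf{y}_s - \theta_q \theta_d^{-1}\, \mathbf{y}_s^T \dot{\mathbf{P}}_d \mathbf{y}_s\right\}_{q=1}^{d-1}, \tag{4.1}$$

where for any $q = 1, \ldots, d$, $\mathbf{y}_s^T \dot{\mathbf{P}}_q \mathbf{y}_s = 2\mathbf{y}_s^T(\mathbf{I} - \mathbf{P}_{\theta,s})\dot{\mathbf{B}}_q\left(\mathbf{B}_{\theta,s}^T \mathbf{W}_s \mathbf{B}_{\theta,s}\right)^{-1}\mathbf{B}_{\theta,s}^T \mathbf{W}_s \mathbf{y}_s$, and

$$\dot{\mathbf{B}}_q = \left\{J_N\left\{B_{j,3}(z_{\theta,i}) - B_{j+1,3}(z_{\theta,i})\right\}\dot F_d(x_{\theta,i})\, x_{i,q}\right\}_{i \in s,\, j = -3, \ldots, J},$$

with

$$\dot F_d(x) = \frac{d}{dx}F_d(x) = \frac{\Gamma(d + 1)}{a\,\Gamma\{(d + 1)/2\}^2\, 2^d}\left(1 - \frac{x^2}{a^2}\right)^{(d-1)/2} I(|x| \le a).$$

In practice, the estimation is implemented via the following procedure.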

Step 1. Standardize the auxiliary variables $\{\mathbf{x}_i\}_{i \in U_N}$ and find the radius $a$ used in the CDF transformation (2.4) by calculating the $100(1 - \alpha)$ percentile of $\{\|\mathbf{x}_i\|\}_{i \in U_N}$ ($\alpha = 0.01, 0.05$ for example).

Step 2. Find the estimator $\hat\theta$ of $\theta_0$ by minimizing $\hat R$ in (2.12) through the PORT optimization routine in the technical report of [?], with $(0, 0, \ldots, 1)^T$ as the initial value and the gradient vector $\hat S^*$ in equation (4.1). If $d < n$, one can take the simple OLS estimator (after standardization) for $\{y_i, \mathbf{x}_i\}_{i \in s}$ with its last coordinate positive.

Step 3. Obtain the estimator $\hat m_i$ of $m(\mathbf{x}_i)$, $i \in U_N$, by applying formula (2.14).

Step 4. Calculate the sample design-based spline estimator of $t_y$ in (2.15).


Remark 4.1. In Step 2, the number of interior knots is

$$J = \min\left\{c_1\left[n^{1/5.5}\right],\, c_2\right\}, \tag{4.2}$$

where $c_1$ and $c_2$ are positive integers and $[\nu]$ denotes the integer part of $\nu$. The choice of the tuning parameter $c_1$ makes little difference for a large sample, and according to the asymptotic theory, there is no optimal way to set $c_1$ and $c_2$. I recommend using $c_1 = 1$ to save computing for massive data sets and $c_2 = 5, 10$ for smooth monotonic or smooth unimodal regression, as suggested by [37].
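The knot-selection rule (4.2) is simple enough to state as code. The following Python sketch is a direct transcription; the function name and the default values $c_1 = 1$, $c_2 = 5$ are choices made for this illustration.

```python
def num_interior_knots(n, c1=1, c2=5):
    """Rule (4.2): J = min(c1 * [n^(1/5.5)], c2), where [v] is the integer part of v."""
    return min(c1 * int(n ** (1 / 5.5)), c2)

# Number of interior knots for the sample sizes used in the simulation study.
knots = {n: num_interior_knots(n) for n in (50, 100, 200, 5000)}
```

With these defaults the rule gives two interior knots for each of the sample sizes 50, 100 and 200, and four for a sample of size 5000, which shows how slowly the number of knots grows with the sample size.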



5 Empirical Results 

In this section, empirical results are provided to demonstrate the applicability of the proposed methodology. A computing package in R can be downloaded from the following website: http://lilywang.myweb.uga.edu/research.htm. Besides the spline single-index (SIM) estimators proposed in the article, I have obtained for comparison the performance of three other estimators: the Horvitz-Thompson estimator (HT) in equation (1.1), the linear regression estimator (LREG) without interaction terms in Chapter 6 of [28], and the spline additive estimator (AM) in [2] with degrees 1, 2 and 3 and adaptive knots. The number of knots $J_N$ for the spline SIM estimator is selected according to (4.2).

5.1 Simulated Population



To illustrate the finite-sample behavior of the estimator $\hat t_{y,\mathrm{diff}}$, some simulation results are presented. For the superpopulation model (1.2), the following six mean functions are considered:

2-dimension (Linear):      $m_1(\mathbf{x}) = x_1 + x_2$
2-dimension (Quadratic):   $m_2(\mathbf{x}) = 1 + (x_1 + x_2)^2$
2-dimension (Bump 1):      $m_3(\mathbf{x}) = x_1 + x_2 + 4\exp\{\ldots\}$
2-dimension (Bump 2):      $m_4(\mathbf{x}) = x_1 + x_2 + 4\exp\{\ldots\}$
4-dimension (Sinusoid):    $m_5(\mathbf{x}) = \sin(\pi\mathbf{x}^T\theta_0)$, $\theta_0 = \ldots$
10-dimension (Sinusoid):   $m_6(\mathbf{x}) = \sin(\pi\mathbf{x}^T\theta_0)$, $\theta_0 = \ldots$

These represent various correct and incorrect single-index model specifications. Function $m_1$ is a simple linear additive function with two auxiliary variables, and it is also a linear single-index function. Functions $m_2$, $m_3$, $m_5$ and $m_6$ are some very common single-index models, but unlike $m_1$, they are not additive, so that the purely linear or additive model would be misspecified. Function $m_4$ is neither a genuine single-index nor a genuine additive function, so that any of the above models would be misspecified. However, because the single-index model in this article is identified by the best approximation (see equation (2.2)) to the multivariate mean function, the estimator $\hat t_{y,\mathrm{diff}}$ is expected to be robust in this case.

The auxiliary vectors $\{\mathbf{x}_i\}_{i \in U_N}$ are generated as i.i.d. uniform $(0, 1)$ random vectors. The population values $y_i$ are generated from the mean functions by adding i.i.d. $N(0, \sigma^2)$ errors with $\sigma = 0.1$ and $0.4$. The population is of size $N = 1000$. Samples are generated by simple random sampling using sample sizes $n_N = 50$, 100 and 200. For each combination of mean function, standard deviation and sample size, 1000 replicates are selected from the same population, the estimators are calculated, and the design bias, design variance and design mean squared error are estimated.
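The replication scheme just described can be sketched in Python for the simplest case: the linear mean function $m_1$ and the Horvitz-Thompson estimator under simple random sampling. This is only an illustration of how design bias, design variance and design MSE are estimated over replicates; the random seed and variable names are assumptions, and the sketch does not reproduce the SIM estimator or the numbers reported in the tables.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible
N, n, sigma, replicates = 1000, 50, 0.1, 1000

# One fixed finite population: x_i ~ uniform(0, 1)^2, y_i = x_i1 + x_i2 + error.
x = [(random.random(), random.random()) for _ in range(N)]
y = [x1 + x2 + random.gauss(0.0, sigma) for x1, x2 in x]
t_y = sum(y)

# Repeated simple random samples from the same population; under this
# design pi_i = n / N, so the Horvitz-Thompson estimate is (N / n) * sum(y_s).
estimates = []
for _ in range(replicates):
    sample = random.sample(range(N), n)
    estimates.append(sum(y[i] for i in sample) * N / n)

mean_est = sum(estimates) / replicates
design_bias = mean_est - t_y
design_var = sum((e - mean_est) ** 2 for e in estimates) / replicates
design_mse = sum((e - t_y) ** 2 for e in estimates) / replicates
percent_rel_bias = design_bias / t_y * 100.0
```

The population is generated once and held fixed, so the variability across replicates comes from the sampling design alone. The estimated design MSE equals the estimated design variance plus the squared design bias.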

Table 1 lists the average mean squared errors (AMSE) of the spline estimators $\hat\theta$ in (2.13) based on $d$ dimensions,

$$\mathrm{AMSE}(\hat\theta) = \frac{1}{d}\sum_{q=1}^{d} \mathrm{MSE}(\hat\theta_q), \tag{5.1}$$

from which one sees that, even for small sample sizes, the estimators are very accurate for all the population models, and the precision improves when the sample size $n_N$ increases.
In terms of the design biases, the percent relative design biases

$$\left\{E_p[\hat t_{y,\mathrm{diff}}] - t_y\right\}/t_y \times 100\%$$

defined in [1] have been measured for all the above models. It is found that the relative design biases of the SIM estimators are quite small (less than one percent for all cases in the simulation) even for sample size $n_N = 50$.

Table 1: AMSE of the spline estimators $\hat\theta$ defined in (2.13). a

                                  Model
   σ    n_N         1         2         3         4         5         6
 0.1     50   0.00076    0.0005   0.00027   0.00157   0.00291   0.00482
        100    0.0004   0.00027   0.00014   0.00071    0.0013   0.00226
        200   0.00019   0.00016   0.00008   0.00038    0.0007   0.00124
 0.4     50   0.01732   0.00326   0.00504   0.01981   0.00958   0.01427
        100   0.00819   0.00177   0.00257   0.00879   0.00453   0.00696
        200   0.00423   0.00089   0.00129   0.00398   0.00233   0.00372

a Based on 1000 replications of simple random samples from a population of size N = 1000.

Table 2 shows the ratios of the design mean squared errors (MSE) for the HT, LREG and AM estimators to the MSE for the proposed spline SIM estimator. From this table, one sees that the model-assisted estimators (LREG, AM and SIM) perform much better than the simple HT estimator regardless of the type of mean function and standard error. For $m_1$, LREG is expected to be the preferred estimator, since the assumed model is correctly specified. The AM and SIM estimators have similar behavior in this case, and the MSE ratios of AM to SIM are close to 1. However, not much efficiency is lost by using SIM and AM instead of LREG: the MSE ratios of LREG to SIM are at least 0.78 for all cases. For the rest of the populations, the SIM estimators perform consistently better than the LREG and AM estimators because the interactions between the auxiliary variables are completely ignored by the LREG and AM estimators. Function $m_4$ is not a genuine single-index function, but the SIM estimators are still much more accurate than the HT, LREG and AM estimators, confirming the theory that the proposed estimators are robust against deviation from the single-index model.

To see how fast the computation is, Table 2 provides the average time (based on 1000 replications) of obtaining the SIM estimators on an ordinary PC with an Intel Pentium IV 1.86 GHz processor and 1.0 GB RAM. It shows that the proposed SIM estimation is extremely fast. For instance, for Model 6, the SIM estimation of a 10-dimensional sample of size 200 takes on average 0.23 second. I have also carried out the simulation with sample size $n_N = 5000$ generated from a population of size 50000. Remarkably, it takes on average less than 8 seconds to get the SIM estimators for all the above models.



5.2 MU281 data 



The MU284 data set from Appendix B of 28j contains data about Swedish municipalities. 
The study variable y is RMT85 x 10 -3 , where RMT85 is municipal tax receipts in 1985. Two 
Table 2: Ratio of MSE of the HT, LREG and additive model-assisted estimators (AM) to the 
single-index model-assisted estimators (SIM) and the average computing time of the SIM. a 

                             MSE Ratio                                              Time of SIM
Model   σ     n_N       HT    LREG   AM (degree = 1)   AM (degree = 2)   AM (degree = 3)   (seconds)

1       0.1    50    12.40    0.78        0.97              1.13              3.05            0.11
              100    14.10    0.88        0.97              0.96              1.02            0.12
              200    14.29    0.90        0.92              0.92              0.95            0.17
        0.4    50     1.15    0.82        1.02              1.19              3.93            0.14
              100     1.74    0.92        1.01              1.01              1.07            0.13
              200     1.72    0.95        0.98              0.97              1.01            0.18

2       0.1    50    35.32    2.75        2.63              3.15              7.32            0.11
              100    41.46    3.48        2.67              2.84              3.19            0.12
              200    43.78    4.44        2.75              2.75              2.82            0.18
        0.4    50     4.03    1.03        1.17              1.36              4.11            0.11
              100     4.62    1.20        1.18              1.20              1.31            0.12
              200     4.65    1.35        1.17              1.16              1.20            0.18

3       0.1    50    34.47    4.24        4.72              5.75             15.53            0.16
              100    39.23    5.38        4.87              5.14              5.56            0.19
              200    42.20    6.82        5.22              5.30              5.36            0.32
        0.4    50     2.98    1.09        1.30              1.53              5.28            0.16
              100     3.36    1.23        1.26              1.28              1.38            0.19
              200     3.80    1.42        1.30              1.30              1.33            0.30

4       0.1    50    10.13    2.88        2.68              3.24              8.17            0.16
              100    11.01    3.50        2.57              2.68              2.86            0.17
              200    12.67    4.63        2.83              2.86              2.88            0.27
        0.4    50     1.59    1.03        1.19              1.40              4.72            0.17
              100     1.80    1.16        1.14              1.15              1.22            0.19
              200     2.10    1.34        1.16              1.16              1.19            0.30

5       0.1    50    18.73    3.51        5.56              8.83             10.79            0.15
              100    25.16    4.42        4.82              5.43              6.67            0.14
              200    29.41    4.97        4.64              4.82              5.02            0.21
        0.4    50     2.54    1.11        1.97              3.43             17.4             0.12
              100     3.08    1.28        1.53              1.63              1.95            0.14
              200     3.43    1.30        1.39              1.41              1.47            0.21

6       0.1    50     8.33    1.63        9.86              7.20              5.26            0.15
              100    13.39    2.21        4.41              5.73             10.53            0.15
              200    18.55    2.90        3.22              3.56              4.09            0.23
        0.4    50     2.08    0.99        6.16              4.59              3.11            0.17
              100     2.63    1.09        2.15              3.01              5.06            0.16
              200     3.20    1.21        1.44              1.59              1.82            0.23

a Based on 1000 replications of simple random sampling from a population of size N = 1000. 
Table 3: Spline estimators $\hat{\theta}$ on MU281 data 

n_N    $\hat{\theta}$      MEAN      BIAS       SD       MSE      AMSE

 50    $\hat{\theta}_1$    0.8343   -0.0069   0.0643   0.0468    0.0507
       $\hat{\theta}_2$    0.5395   -0.0013   0.0940   0.0546

100    $\hat{\theta}_1$    0.8412    0.0001   0.0359   0.0465    0.0480
       $\hat{\theta}_2$    0.5365   -0.0042   0.0563   0.0495

Based on 1000 replications of simple random sampling from the population of N = 281 Swedish municipalities. 



auxiliary variables $x_1$ (CS82) and $x_2$ (SS82) are used, where $x_1$ is the number of Conservative 
Party seats in the municipal council, and $x_2$ is the number of Social Democrat Party seats. 
The three largest cities according to the 1975 population variable (pop75) are discarded 
because they are huge outliers and would be treated separately in practice. The population 
total $t_y$ of the remaining $N = 281$ Swedish municipalities is found to be 53.1510. The "oracle" estimator $\tilde{\theta}$ 
(see (gSP)) at the population level is found to be $(0.8412, 0.5406)^T$. 

A Monte Carlo simulation is carried out in which 1000 repeated SRS samples (with $n_N = 50$ 
and $n_N = 100$) are drawn from the MU281 population of Swedish municipalities. To demonstrate 
the closeness of the spline estimator $\hat{\theta}$ to the "oracle" index parameter $\tilde{\theta}$, Table 3 lists the 
sample mean (MEAN), design bias (BIAS), design standard deviation (SD), design mean 
squared error (MSE) and the AMSE in (5.1) of $\hat{\theta}$. From this table, one sees that the sample-based 
estimators $\hat{\theta}$ are very accurate even for sample size 50. As expected, the coefficients 
are estimated more accurately as the sample size increases. 

Table 4 shows the performance of the HT, LREG, AM and SIM estimators of $t_y$. One sees 
from this table that the model-assisted estimators are much more accurate than the simple 
HT estimator. Among all the model-assisted estimators, the spline SIM estimators are better 
than the other estimators in terms of the MSE. 
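
The Monte Carlo summaries named above (MEAN, BIAS, SD, MSE) can be computed from replicated estimates as in the following minimal sketch. The function name `summarize_replicates` and the toy replicate values are hypothetical; the sketch uses the textbook definitions with divisor equal to the number of replicates, and it does not reproduce whatever scaling the paper applies to its MSE and AMSE columns.

```python
import statistics


def summarize_replicates(estimates, target):
    """Monte Carlo summaries of replicated estimates of a fixed target value."""
    estimates = list(estimates)
    mean = statistics.fmean(estimates)
    bias = mean - target
    # Standard deviation over replicates, dividing by the number of replicates.
    sd = statistics.pstdev(estimates)
    # Mean squared error around the target, not around the replicate mean.
    mse = statistics.fmean([(e - target) ** 2 for e in estimates])
    return {"MEAN": mean, "BIAS": bias, "SD": sd, "MSE": mse}


if __name__ == "__main__":
    # Hypothetical replicate estimates of one index coefficient.
    replicates = [0.83, 0.85, 0.84, 0.86]
    print(summarize_replicates(replicates, target=0.84))
```

With these definitions the MSE equals the squared bias plus the squared SD, which is a convenient sanity check on any simulation summary.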
Appendix A. Preliminaries 

Let matrices 

$$ V_\theta = \frac{1}{N} B_\theta^T B_\theta, \qquad D_\theta = \frac{1}{N} B_\theta^T y. \tag{A.1} $$

The following lemma provides the uniform upper bound of $\|V_\theta^{-1}\|_\infty$. 

Lemma A.1. Under Assumptions (A1)-(A4), there exist constants $0 < c_V < C_V$ such that 
with $\xi$-probability 1 

$$ c_V J_N^{-1} \|w\|_2^2 \le w^T V_\theta w \le C_V J_N^{-1} \|w\|_2^2. $$

Consequently, there exists a constant $C > 0$ such that with $\xi$-probability 1 

$$ \sup_{\theta} \|V_\theta^{-1}\|_\infty \le C J_N. \tag{A.2} $$


The result follows directly from Theorem 5.4.2 of U|] and Assumptions (A1)-(A4), thus 
omitted. 

In the following, let $S_c^{d-1}$ be a cap shape subset of $S^{d-1}$, 

$$ S_c^{d-1} = \Big\{ (\theta_1, \ldots, \theta_d) \,\Big|\, \sum_{q=1}^{d} \theta_q^2 = 1,\ \theta_d \ge c \Big\}, \quad c \in (0,1). $$

Clearly, for an appropriate choice of $c$, $\theta_0 \in S_c^{d-1}$, which I assume in the rest of the article. 

Lemma A.2. Under Assumptions (A1)-(A5), for $k = 0, 1, 2$ 



sup sup 

06 s^i^e[o,i] 



Qk Qk 



(A.3) 



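where $\tilde{\varphi}_\theta$ and $\hat{\varphi}_\theta$ are given in (2.8) and (2.11). 

Proof. First we show the case when $k = 0$. Let 

$$ V_{\theta,\pi} = \frac{1}{N} B_{\theta,s}^T W_s B_{\theta,s}, \qquad D_{\theta,\pi} = \frac{1}{N} B_{\theta,s}^T W_s y_s $$

be the sample versions of the matrices $V_\theta$ and $D_\theta$ in (A.1). Then 

$$ \hat{\varphi}_\theta(z) - \tilde{\varphi}_\theta(z) = B^T(z) V_{\theta,\pi}^{-1} D_{\theta,\pi} - B^T(z) V_\theta^{-1} D_\theta = \zeta(V_{\theta,\pi}, D_{\theta,\pi}), \tag{A.4} $$

which is a nonlinear function of the following $\pi$ estimators: 

$$ v_{\theta,\pi,jj'} = \frac{1}{N} \sum_{i \in s} \frac{B_{j,4}(z_{\theta i}) B_{j',4}(z_{\theta i})}{\pi_i}, \qquad d_{\theta,\pi,j} = \frac{1}{N} \sum_{i \in s} \frac{B_{j,4}(z_{\theta i})\, y_i}{\pi_i}, $$

the components of $V_{\theta,\pi}$ and $D_{\theta,\pi}$, respectively. Thus the first order derivatives of $\zeta$ in (A.4) 
with respect to $v_{\theta,\pi,jj'}$ and $d_{\theta,\pi,j}$ can be written as 

$$ \frac{\partial \zeta}{\partial v_{\theta,\pi,jj'}} = B^T(z) \left( -V_{\theta,\pi}^{-1} \Lambda_{jj'} V_{\theta,\pi}^{-1} \right) D_{\theta,\pi}, \quad -3 \le j \le j' \le J, $$

$$ \frac{\partial \zeta}{\partial d_{\theta,\pi,j}} = B^T(z) V_{\theta,\pi}^{-1} \lambda_j, \quad -3 \le j \le J, $$

where $\lambda_j$ is a $(J + 1)$-vector with the $j$th component equal to one and zeros elsewhere; and $\Lambda_{jj'}$ 
is a $(J + 1) \times (J + 1)$ matrix with the value 1 in positions $(j, j')$ and $(j', j)$ and the value 0 
everywhere else. 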

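Denote the components of $V_\theta$ and $D_\theta$ by 

$$ v_{\theta,jj'} = \frac{1}{N} \sum_{i \in U_N} B_{j,4}(z_{\theta i}) B_{j',4}(z_{\theta i}), \qquad d_{\theta,j} = \frac{1}{N} \sum_{i \in U_N} B_{j,4}(z_{\theta i})\, y_i, $$

respectively. Using the Taylor linearization, one can approximate the function $\zeta$ in (A.4) by a 
linear one, i.e. 

$$ \zeta(V_{\theta,\pi}, D_{\theta,\pi}) = \sum_{j=-3}^{J} B^T(z) V_\theta^{-1} \lambda_j \left( d_{\theta,\pi,j} - d_{\theta,j} \right) - \sum_{-3 \le j \le j' \le J} B^T(z) \left( V_\theta^{-1} \Lambda_{jj'} V_\theta^{-1} \right) D_\theta \left( v_{\theta,\pi,jj'} - v_{\theta,jj'} \right) + R_{iN}, $$

where the remainder term 

$$ R_{iN} = \hat{\varphi}_\theta(z) - \tilde{\varphi}_\theta(z) - \sum_{j=-3}^{J} B^T(z) V_\theta^{-1} \lambda_j \left( d_{\theta,\pi,j} - d_{\theta,j} \right) + \sum_{-3 \le j \le j' \le J} B^T(z) \left( V_\theta^{-1} \Lambda_{jj'} V_\theta^{-1} \right) D_\theta \left( v_{\theta,\pi,jj'} - v_{\theta,jj'} \right). $$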



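Note that 

$$ \sum_{j=-3}^{J} B^T(z) V_\theta^{-1} \lambda_j \left( d_{\theta,\pi,j} - d_{\theta,j} \right) = B^T(z) V_\theta^{-1} \sum_{j=-3}^{J} \lambda_j \left( d_{\theta,\pi,j} - d_{\theta,j} \right) = \frac{1}{N} \sum_{i \in U_N} B^T(z) V_\theta^{-1} B(z_{\theta i})\, y_i \left( \frac{I_i}{\pi_i} - 1 \right) $$

and 

$$ \sum_{-3 \le j \le j' \le J} B^T(z) \left( V_\theta^{-1} \Lambda_{jj'} V_\theta^{-1} \right) D_\theta \left( v_{\theta,\pi,jj'} - v_{\theta,jj'} \right) = \frac{1}{N} \sum_{i \in U_N} B^T(z) V_\theta^{-1} \Big\{ \sum_{-3 \le j \le j' \le J} \Lambda_{jj'} B_{j,4}(z_{\theta i}) B_{j',4}(z_{\theta i}) \Big\} V_\theta^{-1} D_\theta \left( \frac{I_i}{\pi_i} - 1 \right) $$

$$ = \frac{1}{N} \sum_{i \in U_N} \left[ B^T(z) V_\theta^{-1} B(z_{\theta i}) B^T(z_{\theta i}) V_\theta^{-1} D_\theta \right] \left( \frac{I_i}{\pi_i} - 1 \right). $$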
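Thus 

$$ \sup_{\theta \in S_c^{d-1}} \sup_{z \in [0,1]} \Big| \sum_{j=-3}^{J} B^T(z) V_\theta^{-1} \lambda_j \left( d_{\theta,\pi,j} - d_{\theta,j} \right) \Big| = O_p\left( n_N^{-1/2} \right), $$

$$ \sup_{\theta \in S_c^{d-1}} \sup_{z \in [0,1]} \Big| \sum_{-3 \le j \le j' \le J} B^T(z) \left( V_\theta^{-1} \Lambda_{jj'} V_\theta^{-1} \right) D_\theta \left( v_{\theta,\pi,jj'} - v_{\theta,jj'} \right) \Big| = O_p\left( n_N^{-1/2} \right). $$

Similarly, one can show that 

$$ \sup_{\theta \in S_c^{d-1}} \sup_{z \in [0,1]} |R_{iN}| = o_p\left( n_N^{-1/2} \right). $$

Thus, (A.3) holds when $k = 0$. Next, according to (A.8) in Lemma A.4, the corresponding order 
on the right hand side of (A.3) increases by $h^{-k}$ when one takes the $k$th order derivative 
of $B_\theta$. Arguments similar to those given above yield the desired results for $k = 1$ and 2. □ 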



Lemma A.3. Under Assumptions (A1)-(A5), with $p$-probability 1, one has 



and 



sup 



lim sup 



R{6)-R{6) 



8 k ~ 



(A.5) 



(A.6) 



where $R(\theta)$ and $\hat{R}(\theta)$ are the population-based and sample-based empirical risk functions of $\theta$ 
defined in M and 121M . 



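Proof. Let 

$$ A_{N1} = \sup_{\theta \in S_c^{d-1}} \Big| N^{-1} \sum_{i \in U_N} \{ y_i - \tilde{\varphi}_\theta(z_{\theta i}) \}^2 \left( \frac{I_i}{\pi_i} - 1 \right) \Big|, $$

$$ A_{N2} = \sup_{\theta \in S_c^{d-1}} \Big| 2N^{-1} \sum_{i \in U_N} \{ \tilde{\varphi}_\theta(z_{\theta i}) - \hat{\varphi}_\theta(z_{\theta i}) \} \{ y_i - \tilde{\varphi}_\theta(z_{\theta i}) \} \frac{I_i}{\pi_i} \Big|, $$

$$ A_{N3} = \sup_{\theta \in S_c^{d-1}} \Big| N^{-1} \sum_{i \in U_N} \{ \hat{\varphi}_\theta(z_{\theta i}) - \tilde{\varphi}_\theta(z_{\theta i}) \}^2 \frac{I_i}{\pi_i} \Big|. $$

Noting that 

$$ \hat{R}(\theta) - R(\theta) = N^{-1} \sum_{i \in U_N} \Big[ \{ y_i - \hat{\varphi}_\theta(z_{\theta i}) \}^2 \frac{I_i}{\pi_i} - \{ y_i - \tilde{\varphi}_\theta(z_{\theta i}) \}^2 \Big], $$

one has 

$$ \sup_{\theta \in S_c^{d-1}} \left| \hat{R}(\theta) - R(\theta) \right| \le A_{N1} + A_{N2} + A_{N3}. $$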
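
The three-term bound comes from expanding one square. The following worked step is a sketch that assumes the design-weighted form $\hat{R}(\theta) = N^{-1} \sum_{i \in s} \{ y_i - \hat{\varphi}_\theta(z_{\theta i}) \}^2 / \pi_i$ and $R(\theta) = N^{-1} \sum_{i \in U_N} \{ y_i - \tilde{\varphi}_\theta(z_{\theta i}) \}^2$ for the two risk functions, and writes $\hat{\varphi}_i$, $\tilde{\varphi}_i$ as shorthand for the fitted values at $z_{\theta i}$.

```latex
\begin{aligned}
\{y_i - \hat{\varphi}_i\}^2
  &= \{(y_i - \tilde{\varphi}_i) + (\tilde{\varphi}_i - \hat{\varphi}_i)\}^2 \\
  &= \{y_i - \tilde{\varphi}_i\}^2
     + 2\{\tilde{\varphi}_i - \hat{\varphi}_i\}\{y_i - \tilde{\varphi}_i\}
     + \{\hat{\varphi}_i - \tilde{\varphi}_i\}^2, \\[4pt]
\hat{R}(\theta) - R(\theta)
  &= N^{-1}\sum_{i \in U_N} \{y_i - \tilde{\varphi}_i\}^2
       \Big(\frac{I_i}{\pi_i} - 1\Big)
   + 2N^{-1}\sum_{i \in U_N} \{\tilde{\varphi}_i - \hat{\varphi}_i\}
       \{y_i - \tilde{\varphi}_i\}\frac{I_i}{\pi_i}
   + N^{-1}\sum_{i \in U_N} \{\hat{\varphi}_i - \tilde{\varphi}_i\}^2
       \frac{I_i}{\pi_i}.
\end{aligned}
```

Taking absolute values and the supremum over $\theta$ term by term gives one bound for each of $A_{N1}$, $A_{N2}$ and $A_{N3}$.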



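Arguments similar to those in the proof of Theorem 2 entail that $A_{N1}$ converges to zero with 
$p$-probability 1 as $N \to \infty$. For $A_{N3}$, using arguments similar to those in Lemma A.2, one has 
with $p$-probability 1 

$$ A_{N3} = \sup_{\theta \in S_c^{d-1}} \Big| N^{-1} \sum_{i \in U_N} \{ \hat{\varphi}_\theta(z_{\theta i}) - \tilde{\varphi}_\theta(z_{\theta i}) \}^2 \frac{I_i}{\pi_i} \Big| \le \sup_{\theta \in S_c^{d-1}} \sup_{u \in [0,1]} N^{-1} \sum_{i \in U_N} \{ \hat{\varphi}_\theta(u) - \tilde{\varphi}_\theta(u) \}^2 \frac{I_i}{\pi_i} \to 0. $$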



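In terms of $A_{N2}$, note that 

$$ A_{N2} \le \sup_{\theta \in S_c^{d-1}} \Big\{ 2N^{-1} \sum_{i \in U_N} \{ \tilde{\varphi}_\theta(z_{\theta i}) - \hat{\varphi}_\theta(z_{\theta i}) \}^2 \frac{I_i}{\pi_i} \Big\}^{1/2} \times \sup_{\theta \in S_c^{d-1}} \Big\{ 2N^{-1} \sum_{i \in U_N} \{ y_i - \tilde{\varphi}_\theta(z_{\theta i}) \}^2 \frac{I_i}{\pi_i} \Big\}^{1/2} $$

$$ \le 2 A_{N3}^{1/2} \sup_{\theta \in S_c^{d-1}} \Big\{ N^{-1} \sum_{i \in U_N} \{ y_i - \tilde{\varphi}_\theta(z_{\theta i}) \}^2 \frac{I_i}{\pi_i} \Big\}^{1/2}. $$

The definition of $\tilde{\varphi}_\theta$ in (2.8) implies that 

$$ \limsup_{N \to \infty} N^{-1} \sum_{i \in U_N} \{ y_i - \tilde{\varphi}_\theta(z_{\theta i}) \}^2 < \infty. $$

Thus $A_{N2}$ converges to zero with $p$-probability 1 as $N \to \infty$, and (A.5) is proved. 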
Next note that 



ieu N 



09' 



ieu N 



7T, 



"A r_1 ^ I 08 (Z0i) -^08 (Z8i) ~ 08 {zQi) J^08 (Z0i) 



iec/jv 



h 



d 

-N' 1 ^2 0e (zei) —0 e (z d i 
ieu N 



TTI 



1 • 
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According to Lemma I A. 2\ the first and the third terms are of the order O p (j^ +fe /n]/ 2 



p \ U N I N 

Using similar arguments as in Lemma 2 of [1], it can be shown that Jg</3e (zgi) is bounded if 
Assumption (A8) holds, the second and the fourth terms are of the order O p {n^ 2 ^ ■ Thus 
the desired result in (|A.6|) holds. □ 

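Lemma A.4. Under Assumptions (A2), (A4) and (A5), there exists a constant $C > 0$ such 
that with $\xi$-probability 1 

$$ \sup_{\theta \in S_c^{d-1}} \sup_{1 \le p \le d} \Big\| \frac{\partial}{\partial \theta_p} P_\theta \Big\|_\infty \le C J_N. \tag{A.7} $$

Proof. Note that for any $1 \le p \le d$ 

$$ \dot{B}_p = \frac{\partial}{\partial \theta_p} B_\theta = \Big[ \{ B_{j,3}(U_{\theta,i}) - B_{j+1,3}(U_{\theta,i}) \} \dot{F}_d(X_{\theta,i})\, h^{-1} X_{i,p} \Big]_{i=1, j=-3}^{n, N}, \tag{A.8} $$

where $h$ is the distance between neighboring knots. For any vector $a \in R^n$, with probability 1 

$$ \| n^{-1} B_\theta^T a \|_\infty \le \|a\|_\infty \max_{-3 \le j \le N} \Big| n^{-1} \sum_{i=1}^{n} B_{j,4}(U_{\theta,i}) \Big| \le C h \|a\|_\infty, $$

$$ \| n^{-1} \dot{B}_p^T a \|_\infty \le \|a\|_\infty \max_{-3 \le j \le N} \Big| \frac{1}{nh} \sum_{i=1}^{n} \{ (B_{j,3} - B_{j+1,3})(U_{\theta,i}) \} \dot{F}_d(X_{\theta,i}) X_{i,p} \Big| \le C \|a\|_\infty. $$

Thus 

$$ \sup_{\theta \in S_c^{d-1}} \| n^{-1} B_\theta^T \|_\infty \le C h, \qquad \sup_{\theta \in S_c^{d-1}} \sup_{1 \le p \le d} \| n^{-1} \dot{B}_p^T \|_\infty \le C, \quad \text{a.s.} \tag{A.9} $$

Observing that $\dot{P}_p = \frac{\partial}{\partial \theta_p} P_\theta$ satisfies 

$$ \dot{P}_p = (I - P_\theta) \dot{B}_p (B_\theta^T B_\theta)^{-1} B_\theta^T + B_\theta (B_\theta^T B_\theta)^{-1} \dot{B}_p^T (I - P_\theta), \tag{A.10} $$

one only needs to combine (A.2), (A.9), and (A.10) to prove (A.7). □ 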



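Appendix B. Proofs of Theorems 1-3 
B.1 Proof of Theorem 1 

Let $(\Omega, \mathcal{A}, \mathcal{P})$ be the design probability space with respect to the sampling design measure. 
By Lemma A.3, for any $\delta > 0$ and $\omega \in \Omega$, there exists an integer $N_0(\omega)$, such that when 
$N > N_0(\omega)$, $\hat{R}(\tilde{\theta}, \omega) - R(\tilde{\theta}) < \delta/2$. Note that $\hat{\theta} = \hat{\theta}(\omega)$ is the minimizer of $\hat{R}(\theta, \omega)$, so 
$\hat{R}(\hat{\theta}(\omega), \omega) - R(\tilde{\theta}) < \delta/2$. Using Lemma A.3 again, there exists $N_1(\omega)$, such that when 
$N > N_1(\omega)$, $R(\hat{\theta}(\omega)) - \hat{R}(\hat{\theta}(\omega), \omega) < \delta/2$. Thus, when $N > \max(N_0(\omega), N_1(\omega))$, 

$$ R(\hat{\theta}(\omega)) - R(\tilde{\theta}) < \delta/2 + \hat{R}(\hat{\theta}(\omega), \omega) - R(\tilde{\theta}) < \delta/2 + \delta/2 = \delta. $$

By Assumption (A7), for any $\varepsilon > 0$, if $R(\hat{\theta}(\omega)) - R(\tilde{\theta}) < \delta$, then one would have 
$\| \hat{\theta}(\omega) - \tilde{\theta} \| < \varepsilon$ for $N$ large enough, which is true for any $\omega$, and the strong consistency holds. 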
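Next, note that 

$$ \frac{\partial \hat{R}(\theta)}{\partial \theta} \Big|_{\theta = \hat{\theta}} = \frac{\partial \hat{R}(\theta)}{\partial \theta} \Big|_{\theta = \tilde{\theta}} + \frac{\partial^2 \hat{R}(\theta)}{\partial \theta \partial \theta^T} \Big|_{\theta = \bar{\theta}} (\hat{\theta} - \tilde{\theta}), $$

with $\bar{\theta} = t\hat{\theta} + (1 - t)\tilde{\theta}$. So, since the left hand side vanishes at the minimizer $\hat{\theta}$, 

$$ \hat{\theta} - \tilde{\theta} = - \Big( \frac{\partial^2 \hat{R}(\theta)}{\partial \theta \partial \theta^T} \Big|_{\theta = \bar{\theta}} \Big)^{-1} \frac{\partial \hat{R}(\theta)}{\partial \theta} \Big|_{\theta = \tilde{\theta}}, $$

where according to (A.6) and the above consistency result of $\hat{\theta}$, one has 

$$ \frac{\partial^2 \hat{R}(\theta)}{\partial \theta \partial \theta^T} \Big|_{\theta = \bar{\theta}} \to \frac{\partial^2 R(\theta)}{\partial \theta \partial \theta^T} \Big|_{\theta = \tilde{\theta}} \quad \text{as } N \to \infty, $$

in probability $p$, and by (A.6) in Lemma A.3 one has 

$$ \Big\| \frac{\partial \hat{R}(\theta)}{\partial \theta} \Big|_{\theta = \tilde{\theta}} \Big\| \le \sup_{\theta \in S_c^{d-1}} \Big\| \frac{\partial \hat{R}(\theta)}{\partial \theta} - \frac{\partial R(\theta)}{\partial \theta} \Big\| = O_p\left( J_N / n_N^{1/2} \right). $$

Thus $\| \hat{\theta} - \tilde{\theta} \| = O_p\left( J_N / n_N^{1/2} \right)$ by Assumption (A8). 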



B.2 Proof of Theorem 2 

Lemma B.1. Under Assumptions (A1)-(A5) and (A7) one has 

$$ \lim_{N \to \infty} \frac{1}{N} E_p \Big[ \sum_{i \in U_N} (\hat{m}_i - \tilde{m}_i)^2 \Big] = 0, $$

where $\tilde{m}_i$ and $\hat{m}_i$ are defined in (2.9) and (2.12). 

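Proof. Let $\check{m}_i = e_i^T B_{\hat{\theta}} ( B_{\hat{\theta}}^T B_{\hat{\theta}} )^{-1} B_{\hat{\theta}}^T y$, then one can write 

$$ \sum_{i \in U_N} (\hat{m}_i - \tilde{m}_i)^2 = \sum_{i \in U_N} (\hat{m}_i - \check{m}_i)^2 + \sum_{i \in U_N} (\check{m}_i - \tilde{m}_i)^2 + 2 \sum_{i \in U_N} (\hat{m}_i - \check{m}_i)(\check{m}_i - \tilde{m}_i). $$

According to Lemma A.2, $\frac{1}{N} E_p \big[ \sum_{i \in U_N} (\hat{m}_i - \check{m}_i)^2 \big] \to 0$, so it suffices to show 

$$ \lim_{N \to \infty} \frac{1}{N} E_p \Big[ \sum_{i \in U_N} (\check{m}_i - \tilde{m}_i)^2 \Big] = 0. \tag{B.1} $$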
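Let $f(t) = e_i^T P_{t\hat{\theta} + (1-t)\tilde{\theta}}\, y$, then 

$$ \frac{d f(t)}{d t} = e_i^T \sum_{q=1}^{d} \frac{\partial}{\partial \theta_q} P_\theta \Big|_{\theta = t\hat{\theta} + (1-t)\tilde{\theta}} (\hat{\theta}_q - \tilde{\theta}_q)\, y. $$

Therefore, 

$$ \check{m}_i - \tilde{m}_i = e_i^T B_{\hat{\theta}} ( B_{\hat{\theta}}^T B_{\hat{\theta}} )^{-1} B_{\hat{\theta}}^T y - e_i^T B_{\tilde{\theta}} ( B_{\tilde{\theta}}^T B_{\tilde{\theta}} )^{-1} B_{\tilde{\theta}}^T y = f(1) - f(0) = e_i^T \sum_{q=1}^{d} \frac{\partial}{\partial \theta_q} P_\theta \Big|_{\theta = t^*\hat{\theta} + (1-t^*)\tilde{\theta}} (\hat{\theta}_q - \tilde{\theta}_q)\, y, $$

where $t^* \in (0, 1)$. Thus 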



— E n 

N p 



Note that according to Theorem [TJ with p-probability 1, 



^ 9 *>*+(!-*• 



and 



Op ( Jn /n^Jj 2 ) • By Lemma lA, 41 there exists a positive constant Cq such that 



p 



< CqJn with ^-probability 1. Thus (jB.ip follows directly from 



the above arguments and Assumption (A4). Hence the result. 
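□ 

Note that 

$$ \frac{\hat{t}_{y,\mathrm{diff}} - t_y}{N} = \sum_{i \in U_N} \frac{y_i - \tilde{m}_i}{N} \left( \frac{I_i}{\pi_i} - 1 \right) + \sum_{i \in U_N} \frac{\hat{m}_i - \tilde{m}_i}{N} \left( 1 - \frac{I_i}{\pi_i} \right). $$

Then 

$$ E_p \Big| \frac{\hat{t}_{y,\mathrm{diff}} - t_y}{N} \Big| \le E_p \Big| \sum_{i \in U_N} \frac{y_i - \tilde{m}_i}{N} \left( \frac{I_i}{\pi_i} - 1 \right) \Big| + \Big\{ E_p \Big[ \sum_{i \in U_N} \frac{(\hat{m}_i - \tilde{m}_i)^2}{N} \Big] \Big\}^{1/2} \Big\{ E_p \Big[ \sum_{i \in U_N} \frac{(1 - I_i/\pi_i)^2}{N} \Big] \Big\}^{1/2}. \tag{B.2} $$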
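
The decomposition behind (B.2) rests on an algebraic identity for difference-type estimators that holds for any fitted values. The sketch below checks it numerically; the function names are hypothetical, the fitted values `m` are arbitrary numbers rather than spline fits, and the estimator is assumed to have the form $\hat{t}_{y,\mathrm{diff}} = \sum_{i \in U_N} m_i + \sum_{i \in s} (y_i - m_i)/\pi_i$.

```python
import random


def difference_estimator(y, m, pi, sample):
    """t_diff = sum_U m_i + sum_s (y_i - m_i) / pi_i for given fitted values m_i."""
    return sum(m) + sum((y[i] - m[i]) / pi[i] for i in sample)


def decomposition_rhs(y, m, pi, sample):
    """sum_U (y_i - m_i) * (I_i / pi_i - 1), with I_i the sample indicator."""
    in_sample = set(sample)
    total = 0.0
    for i in range(len(y)):
        indicator = 1.0 if i in in_sample else 0.0
        total += (y[i] - m[i]) * (indicator / pi[i] - 1.0)
    return total


if __name__ == "__main__":
    rng = random.Random(3)
    N, n = 50, 10
    y = [rng.gauss(0.0, 1.0) for _ in range(N)]
    m = [rng.gauss(0.0, 1.0) for _ in range(N)]  # arbitrary "fitted" values
    pi = [n / N] * N                              # SRS inclusion probabilities
    s = rng.sample(range(N), n)
    lhs = difference_estimator(y, m, pi, s) - sum(y)
    print(abs(lhs - decomposition_rhs(y, m, pi, s)) < 1e-9)
```

Adding and subtracting the population-based fits inside $(y_i - m_i)$ then splits the error into the two sums bounded in (B.2).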



According to the definition of (2.7), under Assumptions (A1)-(A4), one has 

$$ \limsup_{N \to \infty} \frac{1}{N} \sum_{i \in U_N} (y_i - \tilde{m}_i)^2 < \infty. $$

Following the same argument as in Theorem 1 of [1], the first term on the right hand side of (B.2) 
converges to zero as $N \to \infty$. For the second term, Assumption (A5) implies that 

$$ E_p \Big[ \sum_{i \in U_N} \frac{(1 - I_i/\pi_i)^2}{N} \Big] = \sum_{i \in U_N} \frac{1 - \pi_i}{N \pi_i} \le \frac{1}{\lambda}. $$

According to Lemma B.1, 

$$ \lim_{N \to \infty} \frac{1}{N} E_p \Big[ \sum_{i \in U_N} (\hat{m}_i - \tilde{m}_i)^2 \Big] = 0 \quad \text{with } \xi\text{-probability 1}, $$

and the result follows from Markov's inequality. 
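
The final appeal to Markov's inequality can be written out as follows. This is a sketch of the standard step, with $\eta > 0$ an arbitrary tolerance introduced here for illustration.

```latex
\Pr\!\left( \left| \frac{\hat{t}_{y,\mathrm{diff}} - t_y}{N} \right| > \eta \right)
  \;\le\; \frac{1}{\eta}\, E_p \left| \frac{\hat{t}_{y,\mathrm{diff}} - t_y}{N} \right|
  \;\longrightarrow\; 0
  \qquad \text{as } N \to \infty .
```

Since both terms on the right hand side of (B.2) vanish in the limit, the bound holds for every fixed $\eta$, which is design consistency of $N^{-1} \hat{t}_{y,\mathrm{diff}}$.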



B.3 Proof of Theorem [3] 



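The next lemma derives the asymptotic mean squared error of the proposed spline estimator 
in (2.15). 

Lemma B.2. Under Assumptions (A1)-(A5) and (A7) 

$$ n_N E_p \Big[ \Big( \frac{\hat{t}_{y,\mathrm{diff}} - t_y}{N} \Big)^2 \Big] = \frac{n_N}{N^2} \sum_{i,j \in U_N} (y_i - \tilde{m}_i)(y_j - \tilde{m}_j) \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} + o(1). \tag{B.3} $$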



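Proof. Note that 

$$ n_N^{1/2}\, \frac{\hat{t}_{y,\mathrm{diff}} - t_y}{N} = n_N^{1/2} \sum_{i \in U_N} \frac{y_i - \tilde{m}_i}{N} \left( \frac{I_i}{\pi_i} - 1 \right) + n_N^{1/2} \sum_{i \in U_N} \frac{\hat{m}_i - \tilde{m}_i}{N} \left( 1 - \frac{I_i}{\pi_i} \right). $$

Let 

$$ a_n = n_N^{1/2} \sum_{i \in U_N} \frac{y_i - \tilde{m}_i}{N} \left( \frac{I_i}{\pi_i} - 1 \right), \qquad b_n = n_N^{1/2} \sum_{i \in U_N} \frac{\hat{m}_i - \tilde{m}_i}{N} \left( 1 - \frac{I_i}{\pi_i} \right), $$

then 

$$ E_p [a_n^2] = \frac{n_N}{N^2} \sum_{i,j \in U_N} (y_i - \tilde{m}_i)(y_j - \tilde{m}_j) \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} \le \Big( \frac{1}{\lambda} + \frac{n_N \max_{i \ne j} |\pi_{ij} - \pi_i \pi_j|}{\lambda^2} \Big) \frac{1}{N} \sum_{i \in U_N} (y_i - \tilde{m}_i)^2 < \infty. $$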



< 



i,jeu N 



+ 



A 2 



AT P 



{Ai - rniY 






By Lemma B.1 one has $E_p [b_n^2] = o(1)$, and the Cauchy-Schwarz inequality implies $E_p [a_n b_n] = o(1)$. 
Therefore 

$$ n_N E_p \Big[ \Big( \frac{\hat{t}_{y,\mathrm{diff}} - t_y}{N} \Big)^2 \Big] = E_p [a_n^2] + 2 E_p [a_n b_n] + E_p [b_n^2] = E_p [a_n^2] + o(1). $$

Thus the desired result holds. □ 
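
The cross-term step in the proof of Lemma B.2 can be spelled out as follows. It is a sketch of the standard argument, using only that $E_p[a_n^2]$ is bounded and $E_p[b_n^2] = o(1)$.

```latex
\left| E_p [a_n b_n] \right|
  \;\le\; \left\{ E_p [a_n^2] \right\}^{1/2} \left\{ E_p [b_n^2] \right\}^{1/2}
  \;=\; O(1) \cdot o(1)
  \;=\; o(1).
```

Hence the cross term is negligible rather than exactly zero, and only $E_p[a_n^2]$ contributes to the leading term in (B.3).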
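Denote by 

$$ \mathrm{AMSE} \big( N^{-1} \hat{t}_{y,\mathrm{diff}} \big) = \frac{1}{N^2} \sum_{i,j \in U_N} (y_i - \tilde{m}_i)(y_j - \tilde{m}_j) \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} $$

the asymptotic mean squared error in (B.3). The next result shows that it can be estimated 
consistently by $\hat{V} \big( N^{-1} \hat{t}_{y,\mathrm{diff}} \big)$ in (3.1). 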



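Lemma B.3. Under (A1)-(A7), one has 

$$ \lim_{N \to \infty} n_N E_p \Big| \hat{V} \big( N^{-1} \hat{t}_{y,\mathrm{diff}} \big) - \mathrm{AMSE} \big( N^{-1} \hat{t}_{y,\mathrm{diff}} \big) \Big| = 0. $$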
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Proof. Denote 



52 = ± £ (m-m)^-^) 

53 = at2 Yl {m - m) (rhj - rhj) 



then 



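$$ \hat{V} \big( N^{-1} \hat{t}_{y,\mathrm{diff}} \big) - \mathrm{AMSE} \big( N^{-1} \hat{t}_{y,\mathrm{diff}} \big) = S_1 - \mathrm{AMSE} \big( N^{-1} \hat{t}_{y,\mathrm{diff}} \big) + S_2 + S_3. $$

For the first term $S_1$, one has 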
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and 



i,jeU N 



N 4 



i,j,k,leU N 
TTij — TTiTTj \ / TTkl — 7Tjfc7T; 
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which can be represented as 
and $\limsup_{N \to \infty} N^{-1} \sum_{i \in U_N} (y_i - \tilde{m}_i)^4 < \infty$. Thus $s_{1N}$ goes to zero as $N \to \infty$. Next 

which converges to zero as $N \to \infty$ by Assumption (A6). As a result of the Cauchy-Schwarz 
inequality, one can show that $s_{2N}$ goes to zero as $N \to \infty$. Therefore 

$$ n_N E_p \Big| S_1 - \mathrm{AMSE} \big( N^{-1} \hat{t}_{y,\mathrm{diff}} \big) \Big| \to 0, \quad \text{as } N \to \infty. \tag{B.5} $$

Next for $S_2$, by Lemma B.1 

For $S_3$, applying Lemma B.1 again, one has 

The desired result follows from (B.4)-(B.7). 

Proof of Theorem 3. By the proof of Lemma B.2, 
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N-i(i y , m -t y ) = a-ME^+E^-E^ 
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ty,diS — ty) + °p [ n N 



1/2 



so the desired result follows from Lemma B.3. 



□ 
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